Random Walks for Text Semantic Similarity
Abstract
Many tasks in NLP stand to benefit from robust measures of semantic similarity for units above the level of individual words. Rich semantic resources such as WordNet provide local semantic information at the lexical level. However, effectively combining this information to compute scores for phrases or sentences is an open problem. Our algorithm aggregates local relatedness information via a random walk over a graph constructed from an underlying lexical resource. The stationary distribution of the graph walk forms a “semantic signature” that can be compared to another such distribution to get a relatedness score for texts. On a paraphrase recognition task, the algorithm achieves an 18.5% relative reduction in error rate over a vector-space baseline. We also show that the graph walk similarity between texts has complementary value as a feature for recognizing textual entailment, improving on a competitive baseline system.
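The idea of a walk-based "semantic signature" can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the five-node toy graph, the restart probability of 0.15, the use of a random walk with restart (personalized PageRank) to make the stationary distribution depend on the input text, and cosine similarity as the comparison measure. The abstract does not specify how the graph is built from the lexical resource or how the distributions are compared.

```python
import numpy as np

def semantic_signature(adjacency, seed_nodes, restart=0.15, tol=1e-10, max_iter=1000):
    """Stationary distribution of a random walk that restarts at the text's words.

    `restart` and the restart-based formulation are illustrative assumptions.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]
    row_sums = adjacency.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every node needs at least one outgoing edge")
    transition = adjacency / row_sums  # row-stochastic transition matrix

    seeds = sorted(set(seed_nodes))
    teleport = np.zeros(n)
    teleport[seeds] = 1.0 / len(seeds)  # uniform mass on the text's words

    dist = teleport.copy()
    for _ in range(max_iter):
        nxt = restart * teleport + (1.0 - restart) * (dist @ transition)
        if np.abs(nxt - dist).sum() < tol:  # L1 change below tolerance
            return nxt
        dist = nxt
    return dist

def cosine_similarity(p, q):
    """Cosine of the angle between two signatures (one possible comparison)."""
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

# Hypothetical toy graph: 0=car, 1=automobile, 2=vehicle, 3=banana, 4=fruit.
toy_graph = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 1, 1, 0],
])

sig_car = semantic_signature(toy_graph, [0])
sig_automobile = semantic_signature(toy_graph, [1])
sig_banana = semantic_signature(toy_graph, [3])

sim_related = cosine_similarity(sig_car, sig_automobile)
sim_unrelated = cosine_similarity(sig_car, sig_banana)
```

In this toy setup, `sim_related` comes out higher than `sim_unrelated` because the walks started at "car" and "automobile" concentrate their mass on the same neighborhood. A multi-word text would simply pass several seed nodes.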
Similar Resources
The Sheffield and Basque Country Universities Entry to CHiC: Using Random Walks and Similarity to Access Cultural Heritage
The Cultural Heritage in CLEF 2012 (CHiC) pilot evaluation included these tasks: ad-hoc retrieval, semantic enrichment and variability tasks. At CHiC 2012, the University of Sheffield and the University of the Basque Country submitted a joint entry, attempting the three English monolingual tasks. For the ad-hoc task, the baseline approach used the Indri Search engine. Query expansion approaches...
Text Understanding using Knowledge-Bases and Random Walks
One of the key challenges for creating the semantic representation of a text is mapping words found in a natural language text to their meanings. This task, Word Sense Disambiguation (WSD), is confounded by the fact that words have multiple meanings, or senses, dictated by their use in a sentence and the domain. We present an algorithm that employs random walks over the graph structure of knowl...
WikiWalk: Random walks on Wikipedia for Semantic Relatedness
Computing semantic relatedness of natural language texts is a key component of tasks such as information retrieval and summarization, and often depends on knowledge from a broad range of real-world concepts and relationships. We address this knowledge integration issue with a method of computing semantic relatedness using personalized PageRank (random walks) on a graph derived from Wikipedia. T...
Robust Named Entity Disambiguation with Random Walks
Named Entity Disambiguation is the task of assigning entities from a Knowledge Graph (KG) to mentions of such entities in a textual document. The state-of-the-art for this task balances two disparate sources of similarity: lexical, defined as the pairwise similarity between mentions in the text and names of entities in the KG; and semantic, defined through some graph-theoretic property of a sub...
Measuring semantic relatedness with vector space models and random walks
Both vector space models and graph random walk models can be used to determine similarity between concepts. Noting that vectors can be regarded as local views of a graph, we directly compare vector space models and graph random walk models on standard tasks of predicting human similarity ratings, concept categorization, and semantic priming, varying the size of the dataset from which vector spac...